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Abstract 

Background: The minimal genome of the tiny, hyperthermophilic archaeon Nanoarchaeum equitans contains 
several fragmented genes and revealed unusual RNA processing pathways. These include the maturation of tRNA 
molecules via the trans-splicing of tRNA halves and genomic rearrangements to compensate for the absence of 
RNase P. 

Results: Here, the RNA processing events in the N. equitans cell are analyzed using RNA-Seq deep sequencing 
methodology. All tRNA half precursor and tRNA termini were determined and support the tRNA trans-splicing 
model. The processing of CRISPR RNAs from two CRISPR clusters was verified. Twenty-seven C/D box small RNAs 
(sRNAs) and a H/ACA box sRNA were identified. The C/D box sRNAs were found to flank split genes, to form 
dicistronic tRNA-sRNA precursors and to be encoded within the tRNAMet intron. 

Conclusions: The presented data provide an overview of the production and usage of small RNAs in a cell that 
has to survive with a highly reduced genome. N. equitans lost many essential metabolic pathways but maintains 
highly active CRISPR/Cas and rRNA modification systems that appear to play an important role in genome 
fragmentation. 



Background 

Nanoarchaeum equitans is a 400 nm small archaeon iso- 
lated from hot submarine vent microbial communities 
whose growth relies on its attachment to the cell surface 
of the archaeon Ignicoccus hospitalis [1]. Phylogenetic ana- 
lyses based on its unusual ribosomal RNA sequences 
placed N. equitans into a novel phylum termed 'Nanoarch- 
aeota'. However, different phylogenetic studies focused on 
ribosomal proteins and concluded that N. equitans repre- 
sents a member of a fast-evolving euryarchaeal lineage 
related to the Thermococcales [2]. The genome sequence 
of N. equitans Kin4-M revealed a minimal, compact gen- 
ome of only 490 kilobases and an extremely high gene 
density with little noncoding DNA or pseudogenes [3], 
This highly reduced genome lacks almost all known genes 
for the synthesis of amino acids, nucleotides, cofactors, 
and lipids. Conserved operonic structures are absent and 
an unusually high number of genes is found in split varia- 
tions [3,4]. Examples of such splits are the two open read- 
ing frames encoding domains of the alanyl-tRNA 
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synthetase or the reverse gyrase [3]. Other unusual 
features concern the processing of RNA molecules. 
N. equitans was the first organism shown to require the 
assembly of tRNA halves to generate six essential func- 
tional tRNA isoacceptors [5], A heteromeric splicing endo- 
nuclease generates these mature tRNAs via an unusual 
fra«s-splicing reaction [6-8]. N. equitans is also the only 
currently identified organism that can survive without an 
RNase P molecule [9-11]. RNase P is an otherwise univer- 
sal ribonucleoprotein complex that mediates the removal 
of 5' leaders in pre-tRNAs. The absence of both RNA and 
protein components of RNase P is compensated by geno- 
mic rearrangements that resulted in a removal of 5' leader 
sequences from all N. equitans tRNA genes, ensuring 
proper transcription initiation conditions. 

The loss of many essential pathways has to be compen- 
sated by the transfer of metabolites between N. equitans 
and /. hospitalis [12]. It is assumed that direct cell-cell 
surface contacts as well as interconnections via thin 
fibers fulfill this purpose [13]. The N. equitans genome 
encodes a fairly complete set of proteins for replication, 
transcription and translation. In addition, surprisingly 
extensive sets of genes with proposed roles in DNA 
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repair and RNA modification are annotated. Finally, two 
clustered regularly interspaced short palindromic repeats 
(CRISPR) arrays and a complete set of CRISPR associated 
(Cas) proteins are present. These systems are mainly 
characterized as adaptive antiviral defense systems even 
though the viral threat towards N. equitans is not known 
[14,15]. In this study, RNA-Seq deep sequencing metho- 
dology was used to analyze the RNA components 
involved in the processing and maturation of tRNAs, 
rRNAs and CRISPR RNAs (crRNAs) to obtain insights 
into the usage of small RNA molecules in an organism 
that has to survive with a minimal and condensed 
genome. 

Results and discussion 

Abundance of RNA species 

RNA-Seq methodology was used to obtain a global over- 
view of the production and processing of small RNA 
molecules in N. equitans. Depending on the chosen 
method of RNA isolation, library preparation and 
employed RNA sequencing platform, different RNA spe- 
cies are selectively enriched [16]. To obtain the most com- 
plete picture of the small RNA diversity present in the 
N. equitans cell, small RNAs were enriched but not 
depleted of ribosomal RNAs. All RNA samples were trea- 
ted with T4 polynucleotide kinase and tobacco acid pyro- 
phosphatase before adapter ligation. These steps were 
employed to ensure that RNAs with 5'-triphosphate or 
5'-OH termini and 2',3'-cyclic phosphate termini were cap- 
tured. Illumina HiSeq2000 sequencing was employed, 
which yielded over 12 million reads that could be mapped 
onto the N. equitans genome (Additional file 1). Analysis 
of the abundance of different RNA species verified that all 
expected different types of RNA species were detected in 
the RNA-Seq data, including rRNAs, tRNAs, tRNA half 
molecules, small RNAs (sRNAs) and crRNAs. Neverthe- 
less, only a surprisingly small amount of mature tRNAs 
was detected and sequencing reads (usually less than 1,000 
reads) mapped to fragments of tRNA genes. This observa- 
tion is thought to exemplify the challenges that highly 
structured and heavily modified RNA sequences pose for 
reverse transcriptase enzymes. Most obtained sequencing 
reads (approximately 8.5 million reads) mapped to the 
genes encoding the 5S, 16S and 23S rRNAs followed by 
several highly abundant C/D box sRNAs and crRNAs 
(Figure 1). The analysis of these two small RNA species is 
detailed below. Even though the RNA isolation approach 
was not set up to purify mRNAs, the depths of the 
obtained sequencing results allowed the mapping of abun- 
dant fragmented mRNA reads. Most mRNA fragment 
reads (approximately 119,000 reads) were mapped to the 
two genes NEQ300 (S-layer protein) and NEQ026 (protein 
with unknown function). Both gene products were also 
identified by whole-cell proteomics to be among the six 



most abundant N. equitans proteins [12]. The only identi- 
fiable homolog of NEQ026 is found in Thermofilum pen- 
dens (arCOG06945) and linked to a putative amino acid 
permease. This observation provides a starting point for 
the analysis of the highly abundant protein NEQ026 
potentially being involved in the uptake of amino acids 
from I. hospitalis. 

Maturation of tRNA molecules 

N. equitans was the first organism to be identified to gen- 
erate six tRNA isoacceptors via a traws-splicing reaction 
using tRNA half molecules. Earlier studies characterized 
the mature spliced tRNA molecules in the cell, yet only a 
single tRNA half transcript could be identified [5,17]. 
The depth of the available RNA-Seq data allowed the 
identification of all 11 tRNA halves in the N. equitans 
cell. The tRNA half molecules contain the sequence that 
will fold into the mature tRNA body and a GC-rich 
stretch that is complementary to a sequence only found 
in the matching tRNA half. These sequences are pro- 
posed to facilitate the identification of the matching 
halves upon which the tRNA body folds [5]. Subse- 
quently, the concerted action of a heterotetrameric 
splicing endonuclease and an RNA ligase generates 
fra«s-spliced mature tRNAs. The termini of all tRNA 
halves were identified (Figure 2). The 5'-termini of the 
tRNA precursors are more defined than their 3' termini 
and contain the purine residue required for the proper 
initiation of transcription. In most cases the tRNA pre- 
cursor transcripts do not extend beyond the GC-rich 
stretch. This region is only extended by two nucleotides 
(GC) for the 3' tRNA Met half, by one nucleotide (A) for 
the 3' tRNA His half and shortened by one nucleotide (A) 
for the 3' tRNA Lys half. In some cases, the obtained tRNA 
precursor sequences merge with the mRNA of the adja- 
cent gene. One example is the 5' tRNA Hls half that is 
located directly upstream of the gene encoding the valyl- 
tRNA synthetase and that defines its 5' untranslated 
region. It is tempting to speculate that this structured 
tRNA half might play a role in mRNA stability or regula- 
tion of valyl-tRNA synthetase synthesis. The 5' tRNA Glu 
(UUC) half and 3' tRNA™" half sequences extend into 
adjacent genes of different orientation, which might 
cause difficulties for proper termination signals. These 
sequences indicate that a defined 3' end might not always 
be a necessity for the assembly of tRNA halves into func- 
tional tRNA molecules. Finally, both 5' tRNA halves for 
the tRNAj Met and the tRNA His already contain the addi- 
tional -1 base that leads to an extended acceptor stem in 
the mature tRNA. 

Absence of RNase P 

Potential low abundance structural RNA molecules were 
searched in intergenic regions but the otherwise universal 
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Figure 1 RNA abundance in N. equitans lllumina HiSeq2000 sequencing reads mapped to the N. equitans reference genome (GenBank: 
NC_005213, 490885 bp) highlight the abundance of crRNAs and C/D box sRNAs. 



RNase P RNA molecule could not be detected, which is 
in agreement with previous studies [9]. Genomic rearran- 
gements compensate for the loss of RNase P and ensure 
that all tRNAs start with a purine residue directly at the 
transcription initiation site. The RNA-Seq data allowed 
us to analyze the 5' tRNA termini and verified the 
absence of leader sequences. One interesting example is 
tRNA Tyr , which requires a CI base for its recognition by 
the tyrosyl-tRNA synthetase. However, without RNase P, 



such tRNAs could not start with a pyrimidine residue 
and it was reported that an unsual G-l extension solves 
the need for both a CI base and a purine residue at the 
transcription start [9]. As this unique acceptor stem is 
direct evidence for the absence of RNase P, the RNA-Seq 
reads were mapped to (i) the tRNA gene containing an 
intron and (ii) the mature intron-less tRNA. While there 
were significantly less reads detected for the mature 
tRNA due to problems of reverse transcription of a fully 
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5'-half tRNA Mel precursor (35248 +) 
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5'-half tRNA Gln precursor (419191 -) 
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3'-half tRNA Gln precursor (356559 +) 
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Figure 2 Identification of tRNA half precursors for tRNA splicing in trans. The observed N. equitans genome coverage of lllumina HiSeq2000 
sequence reads pinpoints the termini of tRNA half precursors. Annotation of the tRNA precursors containing sequences that assemble into the final 
tRNA (boxed) and GC-rich reverse complementary stretches that identify the matching half (underlined) are taken from [1 7]. 



modified tRNA, the vast majority of all sequences that 
mapped to the tRNA Tyr locus did contain the G-l 
extension. 

Processing of CRISPR RNAs 

N. equitans contains two CRISPR clusters whose crRNAs 
were found to be highly abundant in the cell. It should be 
noted that both CRISPR clusters are annotated in the 
reverse orientation in popular CRISPR databases [18,19]. 
These crRNAs consist of individual spacer sequences 
that were shown in other organisms to correspond to 
viral or conjugative plasmid fragments that were incorpo- 
rated into the CRISPR during an earlier attack by this 
mobile genetic element and provide resistance against 



repeated attacks via base complementarity [15]. Viruses 
that attack N. equitans are not known. Nevertheless, the 
reduced genome still allows the presence of 41 different 
spacer sequences distributed onto two CRISPR clusters. 
The CRISPR clusters are transcribed and cleaved within 
identical 28 bp repeat sequences. Processed crRNAs were 
detected for all spacers and contained a 5'-terminal 8 
nucleotide tag containing the 3'-terminal repeat nucleo- 
tides ATTGAAAG that is usually generated by Cas6 clea- 
vage (Figure 3). The 3' ends are gradually shortened and 
suggest 3' terminal exonucleolytic degradation. The 
abundance of crRNAs varies drastically, with the highest 
abundance for spacer 2 of CRISPR cluster 1 and spacer 1 
of CRISPR cluster 2. This accumulation of crRNA at the 
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GTTTTTAAATAATAAGGTTCGTAATTGTAAft.TATAAGGGAAAAGGCTCGAATAGCCCTGTGGGTTTCTAATATATTAGAAAT 



Figure 3 Identification of CRISPR RNA processing, (a) The crRNA sequencing reads were mapped to the two N. equitans CRISPR clusters to 
determine the abundance of individual crRNAs. Processing occurs within the repeat elements, generating crRNAs with a 5'-terminal ATTGAAAG 
8 nucleotide tag (underlined) and gradual trimming of the 3'-terminal tag. Arrows indicate hotspots for trimming, (b) Sequencing reads 
upstream of the first repeat within the identical leader sequence of both CRISPR arrays indicate the transcription start site (+1) and a potential 
promoter TATA box (box A) element. 



5' terminus of the CRISPR cluster is in agreement with 
previous observations that spacers in close proximity to 
the promoter region within the leader region of the 
CRISPR cluster represent the most recent attacks [20]. 
The drastically reduced abundance of some spacers can 
be an effect of reduced stability of the crRNA or ineffi- 
cient pre-crRNA processing. In addition, recent work in 
Sulfolobus solfataricus identified a correlation between 
abundance and RNA folding propensity [21]. The low 
abundant crRNA 4 of CRISPR cluster 1 contains a spacer 
that is considerably longer than all other spacers in the 
cluster, which might pose challenges for the crRNA 
maturation machinery. Adjacent to both CRISPR clus- 
ters, a leader region was identified whose sequence is 
identical for 130 nucleotides upstream of the first repeat. 
Thus, both CRISPR clusters possess identical promoters 
and transcription starts in both cases at an adenosine 
residue 33 nucleotides upstream of the first repeat. The 
box A region of archaeal promoters is usually located at a 
fixed distance from the transcription start site [22] and a 



consensus 5'-TTTAAA-3' sequence was indeed identified 
-27 nucleotides upstream of the transcription start. Iden- 
tical promoter and leader regions explain the comparable 
rates of crRNA production from both CRISPR clusters 
and suggest a recent duplication event. The CRISPR/Cas 
system of N. equitans belongs to the recently defined 
type I systems with Cas3 (NEQ022) as the protein that 
catalyzes the degradation of the viral DNA [23,24]. All 
universal proteins Casl (NEQ017), Cas2 (NEQ016) and 
Cas4 (NEQ021) proposed to be involved in the integra- 
tion of spacers are present. Finally, NEQ018, Cas7 
(NEQ019) and Cas5 (NEQ020) are proposed to form the 
Cascade complex that delivers the crRNAs to the DNA 
target. A Cas6 enzyme cannot easily be identified, which 
might indicate its sequence divergence from known Cas6 
enzymes. 

Identification of C/D box and H/ACA box sRNAs 

N. equitans maintains a large arsenal of RNA modifica- 
tion enzymes, and during genome annotation, 14 small 
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nucleolar (sno)-like RNAs that serve as guide RNAs for 
the methylation of rRNAs were described [3], The data 
showed that these snoRNAs (C/D box sRNAs in 
archaea) are highly abundant in the cell (Figure 1). 
Furthermore, the sequences for 26 C/D box sRNAs 
could be identified (Figure 4). All 14 C/D box sRNAs 
predicted by the snoscan algorithm [25] were verified 
and five predictions classified as 'questionable' partly 
overlapped with the identified transcripts. Seven addi- 
tional C/D box sRNAs were identified for which compu- 
tational predictions failed. The alignment of all 
identified C/D box sRNAs highlights their compact and 
conserved arrangement as well as the characteristic box 
C/D and box C7D' sequence motifs (Figure 5). The 5' 
and 3' ends of these RNAs are not base-paired. Putative 
2'O-ribose-methylation targets were predicted by the 
algorithm PLEXY [26] (Additional file 2). The proteins 
fibrillarin (NEQ125) and NOP56 (NEQ342), which 
associate with these C/D box sRNAs and guide methyla- 
tion, are present in the cell. The set of archaeal methyl- 
transferases include an enzyme that is only found in 
Thermococcales and N. equitans and uses S-adenosyl- 
methionine as a cofactor to catalyze the formation of 
5-methyl uridine in tRNAs and rRNAs [27]. A newly 
identified C/D box sRNA (sRNA2) overlaps with the 
gene for this methyltransferase (NEQ053), which might 
indicate their functional association. Three further C/D 
box sRNA genes overlap with genes for putative rRNA 



methyltransferases (sRNA5, sRNAll, sRNA17) and one 
gene (sRNA25) overlaps with a putative rRNA pseu- 
douridine synthase. 

The C/D box sRNAs26 contains an inverted order of 
the conserved boxes with the box D upstream of box C. 
This observation could be an effect of circular box C/D 
evolution that has been recognized in, for example, 
Pyrococcus furiosus [28]. 

A single 159 nucleotide long H/ACA box sRNA was 
identified in the intergenic region between the genes 
NEQ389 (tyrosyl-tRNA synthetase) and NEQ392 (hypothe- 
tical protein). A H/ACA box sRNA guides the pseudouri- 
dylation of rRNA targets. The identified compact H/ACA 
box sRNA contains two extended hairpins that each con- 
tain a bulge with kink-turn (k-turn) bulges (Figure 5). Simi- 
lar bulges were found in pseudouridine guide RNAs in 
Pyrobaculum [29]. A proper H domain (ANANNA) is not 
observed as the two hairpins are separated only by the 
sequence ACA in the hinge region (Figure 5). Potential 
pseudouridylation targets within the 16S rRNA and 23S 
rRNA were determined by the RNAsnoop algorithm [30] 
(Additional file 2). A stable RNA was identified to be 
encoded in the 66 nucleotide intron of the elongator 
tRNA Met . This tRNA is one of four intron-containing 
tRNAs but the other three introns are considerably shorter. 
The RNA contains elements that suggest its potential role 
as a C/D box sRNA potentially guiding RNA methylation 
(Figure 6). The introns of tRNAs are usually rather 
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Figure 4 Identification and alignment of C/D box sRNAs. Twenty-six C/D box sRNAs, each covered by over 3,000 mapped sequencing reads 
(Additional file 2) were identified. All sRNAs contain conserved box C/D and box C/D' elements. The permuted sRNA24 is detailed in Figure 7. 



Randau Genome Biology 2012, 13:R63 
http://genomebiology.com/201 2/1 3/7/R63 



Page 7 of 1 1 




1 10 ACA ACA 160 

Figure 5 Structure prediction for a dicistronic tRNA-C/D box sRNA and a H/ACA box sRNA. (a) The most abundant C/D box sRNA, sRNA8, 
is located immediately downstream of tRNA Val (TAC). The tRNA secondary structure was obtained with tRNA-Scan-SE [41] and a possible C/D box 
sRNA8 secondary structure was computed with mfold [40]. The box C/D and box C/D' elements and the proposed RNase Z cleavage site are 
indicated, (b) Possible secondary structure (mfold [40]) for the single identified H/ACA box sRNA. Two ACA motifs and two identical k-turn 
bulges with potential GA-GA pairs are indicated. 



unstable products and previously only tRNA Trp species 
were known to contain a functional C/D box sRNA in few 
archaea [31,32]. The N. equitans tRNA Met intron also con- 
tains a postulated stable hairpin with six consecutive GC 
base pairings. As split tRNA genes might have been a result 
of the split of intron containing tRNAs, such a structure 
could hint at the origin of the GC-rich complementary 
sequences at the borders of fra«s-spliced tRNA halves. 



Mobile C/D box sRNAs 

Alignment of the DNA stretches upstream and down- 
stream of the identified small RNA termini enables the 
analysis of potential promoter and terminator elements. 
Previously, the conserved elements of nanoarchael tRNA 
and tRNA half gene promoters were identified [9]. The 
promoters contain a clearly identifiable box A motif (5'- 
TTTAAA-3') 26 nucleotides upstream of the transcription 
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start and the terminator contains a stretch of polypyrimi- 
dines (T-stretch) downstream of the tRNA gene. Both ele- 
ments are described to be commonly employed by the 
archaeal RNA transcription machinery [22,33] and can 
also be found for the H/ACA box sRNA. Transcription 
and processing of C/D box sRNAs is more diverse. Some 
C/D box sRNAs (sRNA12, 13, 15, 16, 21) contain their 
own promoter and termination signals and transcription 
starts with a purine residue. However, most C/D box 
sRNAs do not contain easily identifiable promoters or 
start with a pyrimidine residue, which indicates that they 
are processed during maturation. Interestingly, potential 
dicistronic tRNA-sRNA precursors were identified. The 
gene for the most abundant C/D box sRNA, sRNA8, lies 
immediately downstream of the gene for tRNA Val . There- 
fore, 3' processing of the sRNA8-tRNA Val precursor by 
RNase Z (NEQ064) automatically generates the 5' termi- 
nus of C/D box sRNA (Figure 5). This is, to the best of my 
knowledge, the first time that this processing activity 
has been observed in prokaryotes as tRNA-snoRNAs 
were previously thought to be unique to plants [34]. 



Furthermore, two tRNA Gly isoacceptors (tRNAGly(CCC), 
tRNAGly(TCC)) are located adjacent to the C/D box 
sRNAs3 and sRNA14. 

Most C/D box sRNAs overlap with mRNA sequences 
(Additional file 2). One notable feature is the occurrence 
of C/D box sRNA genes at the borders of split genes. 
They are found adjacent to NEQ156 (RNA polymerase 
subunit B, carboxy-terminal part), NEQ096/NEQ097 
(hypothetical protein, carboxy-terminal part), NEQ124 
(archaeosine tRNA-guanine transglycosylase, amino- 
terminal part) and NEQ434 (reverse gyrase, amino-term- 
inal part) (Figure 7a). NEQ157, the gene located at the 
position where the amino-terminal portion of RNA poly- 
merase subunit B would be expected for a continuous 
gene, is flanked by two C/D box sRNA genes. In addition, 
the amino-terminal portion of the reverse gyrase is also 
flanked by two C/D box sRNAs. In eukaryotes, snoRNAs 
are recognized as mobile genetic elements [35] that often 
associate with introns. In archaea and potentially in 
N. equitans, C/D box sRNA can be located within tRNA 
introns [36]. Future research will need to determine the 
basis for C/D box sRNA mobility in Archaea. The obser- 
vation that one of the two C/D box sRNAs that flank the 
reverse gyrase half gene is permuted (Figure 7a) hints at 
a circularized intermediate structure. An inverse RT-PCR 
approach was employed to search for circular C/D box 
sRNAs. Both the C/D box sRNA23 and the permuted 
C/D box sRNA24 were detected as circular molecules in 
the cell (Figure 7b). Only the circular arrangement of the 
permuted C/D box sRNA24 allows the correct, conserved 
order of the C and D box elements and might therefore 
guarantee functionality. This phenomenon needs to be 
considered for automated genome annotation procedures 
of C/D box sRNAs. Genome rearrangements that create 
permuted C/D box sRNAs might only be detectable for 
recent fragmentation events as inactive C/D box sRNA 
genes will quickly accumulate mutations. Taken together, 
these observations strongly suggest an involvement of 
C/D box sRNA sequences in the splitting of both tRNA 
and protein encoding genes in the fragmented genome of 
N. equitans. 

Conclusions 

In the presented study over 12 million RNA sequence 
reads were mapped to the minimal 0.49 million bp gen- 
ome of N. equitans. The resulting sequencing depth 
allowed the detection of all predicted tRNA half precur- 
sors of the tRNA ir««s-splicing pathway. In addition, 
further evidence for the currently unique absence of 
RNase P in this organism was obtained. The analysis of 
the abundant small RNA population identified a consid- 
erable fraction of crRNAs as well as C/D box and 
H/ACA box sRNAs. These findings underline the impor- 
tance of these two RNA fractions for an organism that 
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SRNA23 



5' box C 



box D" 



box C 



boxD 



GTTTTAAAAAATAGCT ATGAAGAAAGCCGCTACTCTGAAT 



TTATGATGAAGAGCCTGGATGCAGATCTATTCCTA 



NEQ434 reverse gyrase fragment 



iiSCTi TTAACAATAACTGAGAGCAATATTAAAAATGCTGAACATCGGGTATACTGAATAGTGATGATTAGCTGTfliA 



SRNA24 5' boxD 



box C 



boxD' 



box C 



3' 



circular sRNA23 

box C 



box D 



box C 



box D' 



5'- CTGAATTTATGATGAAGAGCCTGGATGCAG ATCTATAGCTAT GAAGAAAGCCGCTACTCTGAATTTATGATG -3' 

Joined Termini 



Forward Primer 
circular sRNA24 



Reverse Primer 



box D' 



box C 



box D 



box C 



5'- GCTGAACATCGGGTATACTGAATAGTGATG ATTAGCTGTAAA CTGAGAGCAATATTAAAAATGCTGAACATCGG -3' 

Joined Termini Reverse Primer 



Forward Primer 



Figure 7 C/D box sRNAs flank a split reverse gyrase gene, (a) The gene for the amino-terminal portion of the split reverse gyrase (NEQ434, 
black) is flanked by two box C/D sRNAs. Box C and box D elements (boxed) are indicated to highlight the permutation of the sRNA24. (b) 
Sequencing of inverse RT-PCR amplificates with the indicated (underlined) outward facing primers revealed circular C/D sRNAs23 and 24. The 
detected sites of circularization indicate that circular C/D sRNAs are slightly larger than the average length of the linear molecules amplified by 
RNA-Seq methodology. 



lost most essential pathways for the synthesis of amino 
acids, nucleotides, cofactors, and lipids. It seems plausible 
that an organism that relies on the import of nucleotides 
would require their usage to be constrained. Neverthe- 
less, crRNAs are abundant in the cell, which is in con- 
trast to the silenced expression of CRISPR clusters found, 
for example, in some bacteria [37,38]. N. equitans 
appears to require the constant expression of this inter- 
ference system against the attack of mobile genetic ele- 
ments even though viruses of N. equitans are yet to be 
discovered. The abundant C/D box and H/ACA box 
sRNA fraction showcases the importance of rRNA pro- 
cessing events for N. equitans that is mirrored in its large 
set of RNA processing enzymes. These ribonucleoprotein 
complexes are suggested to ensure proper modification 
(and processing) of rRNAs in hyperthermophilic growth 
conditions. In conclusion, analysis of the RNA fractions 
in the minimal N. equitans cell revealed the loss or 
degeneration of universal RNA molecules (tRNAs, RNase 
P) while other, seemingly less essential RNA species 
(crRNAs, C/D box and H/ACA box sRNAs) are found to 
be highly abundant. The identification of C/D box 
sRNAs adjacent to split protein encoding genes and 
tRNAs as well as within a tRNA intron suggests their 
involvement in genome fragmentation. 

Materials and methods 

Cell cultivation and RNA isolation 

N. equitans Kin4-M cells were a kind gift of D Soil. The 
organism was grown in the Archaeenzentrum Regensburg 
(H Huber, M Thomm, K Stetter) in a 300 liter fermenter 



in simultaneous culture with /. hospitalis KIN4-I and puri- 
fied by gradient centrifugation as described [1]. Total RNA 
was isolated by SDS-lysis of the cell pellet and phenol/ 
chloroform extraction as described [5] and small RNAs 
were purified from total RNA using the MirVana RNA 
extraction kit (Ambion (Paisley, UK). 

RNA-Sequencing 

N. equitantsll. hospitalis RNA (3 ug) was treated with T4 
polynucleotidekinase to ensure proper termini for liga- 
tion. A protocol for the dephosphorylation of 2'3' cyclic 
phosphate termini was modified from [39]: 1 ug of RNA 
was incubated at 37°C for 6 hours with 10 units T4 poly- 
nucleotidekinase and 10 ul 5 x T4 polynucleotidekinase 
buffer (NEB, Ipswich, MA, USA) in a total volume of 
50 ul. Subsequently, 1 mM ATP was added and the reac- 
tion mixture was incubated for 1 hour at 37°C to gener- 
ate monophosphorylated 5' termini. RNA libraries were 
prepared with an Illumina TruSeq RNA Sample Prep Kit 
and sequencing on an Illumina HiSeq2000 sequencer was 
performed at the Max-Planck Genomecentre, Cologne 
(Max Planck Institute for Plant Breeding Research, Koln, 
Germany). 

Identification of small RNA species 

Sequencing reads were trimmed by (i) removal of Illumina 
TruSeq linkers and poly-A tails, and (ii) removal of 
sequences using a quality score limit of 0.05. A total of 
16,614,433 reads with an average length of 62.3 nucleo- 
tides were obtained after trimming. Of these, 626,555 
reads below 15 nucleotides were removed, and 12,178,737 
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reads were mapped to the N. equitans reference genome 
(GenBank: NC_005213) with CLC Genomics Workbench 
5.0 (CLC Bio, Aarhus, Denmark). The following mapping 
parameters were employed: mismatch cost, 2; insertion 
cost, 3; deletion cost, 3; length fraction, 0.5; similarity, 0.8). 
This program was also utilized to determine the coverage 
of individual RNA molecules. All predicted RNA mole- 
cules and their termini were manually verified and all 
intergenic regions were checked for the presence of RNA 
molecules with coverage of less than 1,000 reads. The fol- 
lowing algorithms were used for the computational analy- 
sis of the data: RNA folding (Mfold [40]), tRNA gene 
prediction (tRNAScan-SE [41]), snoRNA gene prediction 
(snoscan [25]), C/D box sRNA target prediction (plexy 
[26]), H/ACA box sRNA target prediction (RNAsnoop 
[30]), crRNA identification (crisprdb [18]), RNA align- 
ments (ClustalW2 [42]), RNA visualization (VARNA [43]). 
Gene annotations were obtained from GenBank and 
tRNA annotations were taken from [17]. 

Inverse RT-PCR 

Circular C/D box sRNAs 23 and 24 were amplified from 
the small RNA purification sample via the Thermoscript 
RT-PCR system (Invitrogen (Paisley, UK) with Thermo- 
script reverse transcriptase and Platinum Taq DNA poly- 
merase according to the manufacturer's instructions. The 
RNA was denatured at 100°C for 5 minutes and snap- 
cooled on ice for 5 minutes to facilitate reverse transcrip- 
tion at 70°C through potential secondary structures of the 
RNA. The following oligonucleotides were employed: 
sRNA23For, 5'-CTGAATTTATGATGAAGAGCCTG- 
GATGCAG-3'; sRNA23Rev: 5'- CATCATAAATTCA- 
GAGTAGCGGCTTTCTTC-3'; sRNA24For, 5'- GCTG 
AACATCGGGTATACTGAATAGTGATG-3'; sRNA24 
Rev, 5'- CCGATGTTCAGCATTTTTAATATTGCTCT- 
CAG-3'. The oligonucleotides partly overlap to ensure 
proper annealing to the sRNA template. PCR amplificates 
were cloned into a pCR2.1 TOPO vector (Invitrogen) and 
subjected to DNA sequencing (Eurofins MWG Operon 
(Ebersberg, Germany). 

Data availability 

The RNA-Seq data are available at NCBI's Gene Expres- 
sion Omnibus (GEO) website as series GSE38821. 

Additional material 



sRNAs. Indicated are genomic location, genomic context, termini, box C 
and box D structures and potential sites of action. 
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